Enabling Interactive Analytics of Secure Data using Cloud Kotta
Research, especially in the social sciences and humanities, is increasingly
reliant on the application of data science methods to analyze large amounts of
(often private) data. Secure data enclaves provide a solution for managing and
analyzing private data. However, such enclaves do not readily support discovery
science---a form of exploratory or interactive analysis by which researchers
execute a range of (sometimes large) analyses in an iterative and collaborative
manner. The batch computing model offered by many data enclaves is well suited
to executing large compute tasks; however, it is far from ideal for day-to-day
discovery science. As researchers must submit jobs to queues and wait for
results, the high latencies inherent in queue-based, batch computing systems
hinder interactive analysis. In this paper we describe how we have augmented
the Cloud Kotta secure data enclave to support collaborative and interactive
analysis of sensitive data. Our model uses Jupyter notebooks as a flexible
analysis environment and Python language constructs to support the execution of
arbitrary functions on private data within this secure framework.
Comment: To appear in Proceedings of the Workshop on Scientific Cloud Computing (ScienceCloud 2017), Washington, DC, USA, June 2017; 7 pages.
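The abstract's core mechanism, using Python language constructs to ship arbitrary functions into a secure enclave, can be sketched with a decorator. This is an illustrative pattern only, not Cloud Kotta's actual API; `secure_task` and `EnclaveStub` are assumed names, and the stub simply runs the function locally where a real system would dispatch it to the enclave.

```python
import pickle
import functools

class EnclaveStub:
    """Stand-in for a secure execution service: it receives serialized
    arguments, runs the function where the private data lives, and
    returns only the (non-sensitive) result."""
    def submit(self, func, payload):
        args, kwargs = pickle.loads(payload)
        return func(*args, **kwargs)

enclave = EnclaveStub()

def secure_task(func):
    """Mark a function for execution inside the enclave; arguments are
    serialized as they would be for transport to the secure system."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        payload = pickle.dumps((args, kwargs))
        return enclave.submit(func, payload)
    return wrapper

@secure_task
def count_rows(records):
    # Imagine `records` resolving to a private dataset inside the enclave.
    return len(records)
```

From a notebook, the analyst calls `count_rows(...)` as an ordinary function, which is what makes the enclave usable for interactive, iterative work.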
nelli: a lightweight frontend for MLIR
Multi-Level Intermediate Representation (MLIR) is a novel compiler
infrastructure that aims to provide modular and extensible components to
facilitate building domain specific compilers. However, since MLIR models
programs at an intermediate level of abstraction, and most extant frontends are
at a very high level of abstraction, the semantics and mechanics of the
fundamental transformations available in MLIR are difficult to investigate and
employ in and of themselves. To address these challenges, we have developed
\texttt{nelli}, a lightweight, Python-embedded, domain-specific language for
generating MLIR code. \texttt{nelli} leverages existing MLIR infrastructure to
develop Pythonic syntax and semantics for various MLIR features. We describe
\texttt{nelli}'s design goals, discuss key details of our implementation, and
demonstrate how \texttt{nelli} makes it easy to define compute kernels and
lower them to diverse hardware platforms.
DRIVE: A Distributed Economic Meta-Scheduler for the Federation of Grid and Cloud Systems
The computational landscape is littered with islands of disjoint resource providers including
commercial Clouds, private Clouds, national Grids, institutional Grids, clusters, and data centers.
These providers are independent and isolated due to a lack of communication and coordination;
they are also often proprietary, without standardised interfaces, protocols, or execution environments.
The lack of standardisation and global transparency has the effect of binding consumers
to individual providers. With the increasing ubiquity of computation providers there is an opportunity
to create federated architectures that span both Grid and Cloud computing providers
effectively creating a global computing infrastructure. In order to realise this vision, secure and
scalable mechanisms to coordinate resource access are required. This thesis proposes a generic
meta-scheduling architecture to facilitate federated resource allocation in which users can provision
resources from a range of heterogeneous (service) providers.
Efficient resource allocation is difficult in large-scale distributed environments due to the inherent
lack of centralised control. In a Grid model, local resource managers govern access to a
pool of resources within a single administrative domain but have only a local view of the Grid
and are unable to collaborate when allocating jobs. Meta-schedulers act at a higher level and are able to
submit jobs to multiple resource managers; however, they are most often deployed on a per-client
basis and are therefore concerned only with their own allocations, essentially competing against one
another. In a federated environment the widespread adoption of utility computing models seen in
commercial Cloud providers has re-motivated the need for economically aware meta-schedulers.
Economies provide a way to represent the different goals and strategies that exist in a competitive
distributed environment. The use of economic allocation principles effectively creates an
open service market that provides efficient allocation and incentives for participation.
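The economic-allocation idea above can be illustrated with a minimal sealed-bid reverse auction: providers bid a price to host a job, and the market allocates to the cheapest provider with sufficient capacity. This is a toy sketch of the general principle, not DRIVE's protocol; the provider names, fields, and tie-breaking are assumptions.

```python
def allocate(job, bids):
    """Sealed-bid reverse auction over a job.

    bids: mapping of provider name -> (offered_price, available_cores).
    Returns (winner, price), or None if no provider can host the job.
    """
    # Keep only providers with enough capacity for the job.
    feasible = {p: price for p, (price, cores) in bids.items()
                if cores >= job["cores"]}
    if not feasible:
        return None
    # The cheapest feasible bid wins.
    winner = min(feasible, key=feasible.get)
    return winner, feasible[winner]

job = {"name": "render", "cores": 8}
bids = {"cloud-a": (0.40, 16), "grid-b": (0.25, 8), "cluster-c": (0.30, 4)}
# cluster-c lacks capacity; grid-b is the cheapest of the rest.
result = allocate(job, bids)
```

In a real market the bids encode each participant's goals and strategy, which is exactly the expressive power the paragraph attributes to economic allocation.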
The major contributions of this thesis are the architecture and prototype implementation of the
DRIVE meta-scheduler. DRIVE is a Virtual Organisation (VO) based distributed economic meta-scheduler
in which members of the VO collaboratively allocate services or resources. Providers
joining the VO contribute obligation services to the VO. These contributed services are in effect
membership “dues” and are used in the running of the VO's operations – for example, allocation,
advertising, and general management. DRIVE is independent of a particular class of provider
(Service, Grid, or Cloud) or specific economic protocol. This independence enables allocation in
federated environments composed of heterogeneous providers in vastly different scenarios. Protocol
independence facilitates the use of arbitrary protocols based on specific requirements and
infrastructural availability. For instance, within a single organisation where internal trust exists,
users can achieve maximum allocation performance by choosing a simple economic protocol.
In a global utility Grid no such trust exists. The same meta-scheduler architecture can be used
with a secure protocol which ensures the allocation is carried out fairly in the absence of trust.
DRIVE establishes contracts between participants as the result of allocation. A contract describes
individual requirements and obligations of each party. A unique two-stage contract negotiation
protocol is used to minimise the effect of allocation latency. In addition, due to the cooperative nature of
the architecture and the use of secure, privacy-preserving protocols, DRIVE can be deployed in a
distributed environment without requiring large-scale dedicated resources.
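The two-stage negotiation described above can be sketched as follows. All class and field names here are assumptions for illustration, not DRIVE's interfaces: stage one cheaply reserves capacity (minimising the latency-critical path of allocation), and stage two later binds the reservation into a full contract with detailed terms.

```python
import uuid

class Provider:
    """Toy provider participating in a two-stage contract negotiation."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reservations = {}

    def propose(self, cores):
        """Stage 1: a fast, tentative agreement. Capacity is held and a
        reservation token returned; no detailed terms are negotiated yet."""
        if self.capacity < cores:
            return None
        token = str(uuid.uuid4())
        self.capacity -= cores
        self.reservations[token] = cores
        return token

    def commit(self, token, terms):
        """Stage 2: bind the reservation into a full contract describing
        the obligations of each party."""
        cores = self.reservations.pop(token)
        return {"cores": cores, **terms}

provider = Provider(capacity=16)
token = provider.propose(cores=8)  # latency-critical fast path
contract = provider.commit(token, {"deadline_s": 3600, "price": 0.25})
```

A real protocol would also need timeouts that release stale reservations, but the split itself is what keeps allocation latency low.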
This thesis presents several other contributions related to meta-scheduling and open service
markets. To overcome the perceived performance limitations of economic systems, four high-utilisation
strategies have been developed and evaluated. Each strategy is shown to improve occupancy,
utilisation, and profit using synthetic workloads based on a production Grid trace. The
gRAVI service wrapping toolkit is presented to address the difficulty of web-enabling existing applications.
The gRAVI toolkit has been extended for this thesis such that it creates economically
aware (DRIVE-enabled) services that can be transparently traded in a DRIVE market without requiring
developer input. The final contribution of this thesis is the definition and architecture of
a Social Cloud – a dynamic Cloud computing infrastructure composed of virtualised resources
contributed by members of a social network. The Social Cloud prototype is based on DRIVE
and highlights the ease with which dynamic DRIVE markets can be created and used in different
domains.
Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments
We present an overview of the recently funded "Merging Science and
Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our
approach has two nested goals: 1) deliver an environment that enables
researchers to create a complete narrative of the research process including
exposure of the data-to-publication lifecycle, and 2) systematically and
persistently link research publications to their associated digital scholarly
objects such as the data, code, and workflows. To enable this, Whole Tale will
create an environment where researchers can collaborate on data, workspaces,
and workflows and then publish them for future adoption or modification.
Published data and applications will be consumed either directly by users of
the Whole Tale environment or integrated into existing or future domain
Science Gateways.
A Social Cloud for Public eResearch
Scientific researchers faced with extremely large computations or the requirement of storing vast quantities of data have come to rely on distributed computational models such as cloud computing. However, distributed computation is typically complex and expensive. The Social Cloud for Public eResearch aims to provide researchers with a platform that exploits social networks to reach out to users who would otherwise be unlikely to donate computational time for scientific and other research-oriented projects. In this paper we explore the motivations of users to contribute computational time and examine the various ways these motivations can be catered to through established social networks. We specifically look at integrating Facebook and BOINC, and discuss the architecture of the functional system and the novel social engineering algorithms that power it.
Greedy nominator heuristic: virtual function placement on fog resources
Fog computing is an intermediate infrastructure between edge devices (e.g., Internet of Things) and cloud systems that is used to reduce latency in real-time applications. An application can be composed of a collection of virtual functions, between which dependency constraints can be captured in a service function chain (SFC). Virtual functions within an SFC can be executed at different geo-distributed locations. However, virtual functions are prone to failure and often do not complete within a deadline. This results in function reallocation to other nodes within the infrastructure, causing delays, potential data loss during function migration, and increased costs. We propose the Greedy Nominator Heuristic (GNH) to address these issues. GNH is based on redundant deployment and failure tracking of virtual functions. GNH places replicas of each function at multiple locations, taking account of expected completion time, failure risk, and cost. We make use of a MapReduce-based mechanism, where Mappers find suitable locations in parallel, and a Reducer then ranks these locations. Our results show that GNH reduces latency by up to 68%, and is more cost-effective than other approaches that rely on state-of-the-art optimization algorithms to allocate replicas.
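The MapReduce-based mechanism in the abstract can be sketched as a map phase that scores candidate nodes in parallel and a reduce phase that ranks them and nominates the top-k for redundant replica placement. The scoring weights and field names below are illustrative assumptions, not GNH's actual cost model.

```python
def score_node(node):
    # Lower is better: a weighted combination of expected completion
    # time, failure probability, and monetary cost (weights assumed).
    return node["exp_time_s"] + 100 * node["failure_prob"] + 10 * node["cost"]

def map_phase(nodes):
    # Each mapper scores one candidate location independently, so this
    # step parallelises trivially across nodes.
    return [(node["name"], score_node(node)) for node in nodes]

def reduce_phase(scored, k):
    # The reducer ranks all scored locations and nominates the best k
    # to host redundant replicas of the virtual function.
    return [name for name, _ in sorted(scored, key=lambda pair: pair[1])[:k]]

nodes = [
    {"name": "fog-1", "exp_time_s": 20, "failure_prob": 0.10, "cost": 1.0},
    {"name": "fog-2", "exp_time_s": 35, "failure_prob": 0.02, "cost": 0.5},
    {"name": "edge-3", "exp_time_s": 15, "failure_prob": 0.30, "cost": 0.2},
]
replicas = reduce_phase(map_phase(nodes), k=2)
```

Deploying to several nominated nodes at once is what lets the SFC tolerate a failed or late function without waiting for migration.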